Linuxdoc-SGML User's Guide
Matt Welsh, mdw@sunsite.unc.edu
v1.3, 7 June 1994
This document is a user's guide to the linuxdoc-sgml formatting
system, an SGML-based text formatter which allows you to produce LaTeX,
plain ASCII, and HTML from a single source format. This guide
documents Linuxdoc-SGML version 1.1.
1. Introduction
This is a user's guide to the linuxdoc-sgml document processing
system, for use with Linux documentation. linuxdoc-sgml is an SGML DTD
(Document Type Definition) and set of ``replacement files'' which
convert the SGML to groff, LaTeX, and HTML source. In the future,
linuxdoc-sgml will support texinfo, as well as other formats.
linuxdoc-sgml is based heavily on the QWERTZ DTD by Tom Gordon,
thomas.gordon@gmd.de. I have only made revisions to his DTD and
replacement files for use by Linux documentation.
linuxdoc-sgml is not meant to be a general document-processing system.
Although it can be used for documents of many types, I have tailored
it for use by the Linux documenters in producing HOWTOs, FAQs, and
(later) the Linux Documentation Project manuals. Therefore, I have
tweaked features into and out of the system for this purpose. If you
see a lack of generality in the system, that is the reason. There's
nothing binding linuxdoc-sgml to Linux documentation, but all
documents produced by the system will look a certain way. If you want
things to look different, I suggest that you use a more generalized
system such as the plain QWERTZ DTD.
One of the goals of this system is to make documents easy to produce
in numerous formats. Until now, most Linux documentation has been
produced in plain ASCII through manual editing. A system like groff
can take care of the plain-text formatting, but that still doesn't
give you HTML (for use on the World Wide Web), LaTeX (for nicely
printed documents), or texinfo. Therefore, if there are features
missing from this system that you would like, please let me know! The
idea is that we shouldn't have to use a lot of hackery to produce
good-looking docs in multiple formats. The author should have to do
as little as possible.
1.1. About this document
This document is written using the linuxdoc-sgml DTD. It contains more
or less everything you need to know to write SGML docs with this DTD.
See example.sgml for an example of an SGML document that you can use
as a model for your own docs.
1.2. Why SGML?
I chose SGML for this system because SGML is made specifically for
translation to other formats. SGML, which stands for Standard
Generalized Markup Language, allows you to specify the structure of a
document---that is, what kinds of things make up the document. You
specify the structure of a document with a DTD (Document Type
Definition). linuxdoc-sgml is one DTD that specifies the structure for
Linux HOWTOs and other docs. QWERTZ is another DTD; the SGML standard
provides DTDs for books, articles, and other generic document types.
The DTD specifies the names of ``elements'' within the document. An
element is just a bit of structure---like a section, a subsection, a
paragraph, or even something smaller like emphasised text. Unlike
LaTeX, however, these elements are not in any way intrinsic to SGML
itself. The linuxdoc-sgml DTD happens to define elements that look a
lot like their LaTeX counterparts---you have sections, subsections,
verbatim ``environments'', and so forth. However, using SGML you can
define any kind of structure for the document that you like. In a way,
SGML is like low-level TeX, while the linuxdoc-sgml DTD is like LaTeX.
Don't be confused by this analogy. SGML is not a text-formatting
system. There is no ``SGML formatter'' per se. SGML source is only
converted to other formats for processing. Furthermore, SGML itself is
used only to specify the document structure. There are no
text-formatting facilities or ``macros'' intrinsic to SGML itself. All of
those things are defined within the DTD. You can't use SGML without a
DTD---a DTD defines what SGML does.
1.3. How it works
Here's how processing a document with SGML and the linuxdoc-sgml DTD
works. First, you need a DTD. I'm using the QWERTZ DTD which was
produced, originally, by a group of people who needed a LaTeX-like
DTD. I've modified the QWERTZ DTD to produce the linuxdoc-sgml DTD for
our purposes. The DTD simply sets up the structure of the document. A
small portion of it looks like this:
<!element article - -
(titlepag, header?,
toc?, lof?, lot?, p*, sect*,
(appendix, sect+)?, biblio?) +(footnote)>
This part sets up the overall structure for an ``article'', which is
like a ``documentstyle'' within LaTeX. The article consists of a
titlepage (titlepag), an optional header (header), an optional table
of contents (toc), optional lists of figures (lof) and tables (lot),
any number of paragraphs (p), any number of top-level sections (sect),
optional appendices (appendix), an optional bibliography (biblio) and
footnotes (footnote).
As you can see, the DTD doesn't say anything about how the document
should be formatted or what it should look like. It just defines what
parts make up the document. Elsewhere in the DTD, the structures of the
titlepag, header, sect, and other elements are defined.
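To make the article content model concrete, here is a minimal Python sketch. It checks a flat list of an article's top-level element names against the model in the DTD excerpt by turning that model into a regular expression. This only illustrates what "following the structure of the DTD" means. It is not how sgmls works internally, and the function name is hypothetical. The sketch also ignores the +(footnote) inclusion, which lets footnotes appear anywhere.

```python
import re

# The article content model from the DTD excerpt, rewritten as a regex:
#   (titlepag, header?, toc?, lof?, lot?, p*, sect*, (appendix, sect+)?, biblio?)
# Each child element name is followed by a space so names cannot run together.
# The +(footnote) inclusion is not modelled here.
ARTICLE_MODEL = re.compile(
    r"titlepag (header )?(toc )?(lof )?(lot )?(p )*(sect )*"
    r"(appendix (sect )+)?(biblio )?"
)


def matches_article_model(children):
    """Return True if the top-level child element names fit the article model."""
    sequence = "".join(name + " " for name in children)
    return ARTICLE_MODEL.fullmatch(sequence) is not None


print(matches_article_model(["titlepag", "toc", "p", "sect", "sect"]))  # True
print(matches_article_model(["toc", "titlepag", "sect"]))  # False: titlepag must come first
```

A real parser reports such a mismatch as an error message. This is why it pays to know the document structure before writing.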
You don't need to know anything about the syntax of the DTD in order
to write documents. I'm just presenting it so you know what it looks
like and what it does. You do need to be familiar with the document
structure that the DTD defines. If not, you might violate the
structure when attempting to write a document, and be very confused
about the resulting error messages. We'll describe the structure of
linuxdoc-sgml documents in detail later.
The next step is to write a document using the structure defined by
the DTD. Again, the linuxdoc-sgml DTD makes documents look a lot like
LaTeX---it's very easy to follow. In SGML jargon a single document
written using a particular DTD is known as an ``instance'' of that
DTD.
In order to translate the SGML source into another format (such as
LaTeX or nroff) for processing, the SGML source (the document that you
wrote) is parsed along with the DTD by (you guessed it) the SGML
parser. I'm using the sgmls parser by James Clark, jjc@jclark.com,
who also happens to be the author of groff. We're in good hands. The
parser (the executable sgmls) simply picks through your document and
verifies that it follows the structure set forth by the DTD. It also
spits out a more explicit form of your document, with all ``macros''
and elements expanded, which is understood by sgmlsasp, the next part
of the process.
sgmlsasp is responsible for converting the output of sgmls to another
format (such as LaTeX). It does this using replacement files, which
describe how to convert elements in the original SGML document into
corresponding source in the ``target'' format (such as LaTeX or
nroff).
For example, part of the replacement file for LaTeX looks like:
<itemize> + "\\begin{itemize}" +
</itemize> + "\\end{itemize}" +
This says that whenever you begin an itemize element in the SGML
source, it should be replaced with
\begin{itemize}
in the LaTeX source. (As I said, elements in the linuxdoc-sgml DTD are
very similar to their LaTeX counterparts.)
So, to convert the SGML to another format, all you have to do is write
a new replacement file for that format that gives the appropriate
analogues to the SGML elements in that new format. In practice, it's
not that simple---for example, if you're trying to convert to a format
that isn't structured at all like your DTD, you're going to have
trouble. In any case, it's much easier to do than writing individual
parsers and translators for many kinds of output formats; SGML
provides a generalized system for converting one source to many
formats.
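The idea of a replacement file can be illustrated with a toy Python model. The parser's output is represented as a hypothetical list of start, end, and data events. Each target format is a table mapping an element name to the text emitted at its start and end. The LaTeX itemize strings mirror the replacement-file excerpt above. The \item string, the HTML strings, the event format, and the function name are assumptions for illustration. Real sgmlsasp reads sgmls output and replacement files in their own syntax.

```python
# A toy model of replacement-file translation, in the spirit of sgmlsasp.
# The event format and the tables below are hypothetical simplifications.

# Each table maps an element name to (start text, end text).
LATEX = {
    "itemize": ("\\begin{itemize}\n", "\\end{itemize}\n"),
    "item": ("\\item ", "\n"),
}
HTML = {
    "itemize": ("<UL>\n", "</UL>\n"),
    "item": ("<LI>", "\n"),
}


def translate(events, table):
    """Replace start/end events using the table; copy character data through."""
    out = []
    for kind, value in events:
        if kind == "start":
            out.append(table[value][0])
        elif kind == "end":
            out.append(table[value][1])
        else:  # "data": character data is passed through unchanged
            out.append(value)
    return "".join(out)


# One parsed document, as a stream of events. The item end events are
# explicit here, as the parser would imply them.
doc = [
    ("start", "itemize"),
    ("start", "item"), ("data", "Here is an item."), ("end", "item"),
    ("start", "item"), ("data", "Here is a second item."), ("end", "item"),
    ("end", "itemize"),
]

print(translate(doc, LATEX))
print(translate(doc, HTML))
```

In this model, adding another output format means writing another table. The parsed document stays the same, and only the mapping changes.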
Once sgmlsasp has completed its work, you have LaTeX source which
corresponds to your original SGML document, which you can format using
LaTeX as you normally would. Later in this document I'll give examples
and show the commands used to do the translation and formatting. You
can do this all on one command line.
But first, I should describe how to install and configure the
software.
2. Installation
The file linuxdoc-sgml.tar.gz contains everything that you need to
write SGML documents and convert them to LaTeX, nroff, and HTML. In
addition to this package, you will need one or both of the following:
1. groff. You need version 1.08 or 1.09. Apparently some of the
margin-handling in groff is in a state of flux from version to
version; they both work, but you get slightly different results.
(Particularly, with 1.09 the left margin isn't indented two
characters as it is in 1.08. There is a way around it, but it looks
terrible on 1.08.) Versions previous to 1.08 will not work. You
can get groff from prep.ai.mit.edu in /pub/gnu. There is a Linux
binary version on sunsite as well. You will need groff to produce
plain ASCII from your SGML docs. (TeX/LaTeX will be used to
produce nicely-printed PostScript and .dvi.)
2. TeX and LaTeX. This is available more or less everywhere; you
should have no problem getting it and installing it (there is a
Linux binary distribution on sunsite). Of course, you only need
TeX/LaTeX if you want to format your SGML docs with LaTeX. So,
installing TeX/LaTeX is optional. See the section on the Linux
HOWTO project below for how we'll manage this vis-a-vis the Linux
HOWTOs.
3. If you want to view the generated HTML, I suggest getting NCSA
Mosaic 2.2 or later.
Neither groff nor TeX/LaTeX is required by the SGML system itself, but
I suggest that you get one or the other in order to format your docs
and verify that they look all right before distributing them.
2.1. Installing the software
The steps needed to install and configure the linuxdoc-sgml stuff are
as follows:
1. First, unpack the tar file linuxdoc-sgml.tar.gz somewhere. This
will create the directory linuxdoc-sgml where all of the SGML files
live. It doesn't matter where you unpack this file; just don't move
things around within the linuxdoc-sgml directory.
2. Next, you need to compile the sgmls parser. In the
linuxdoc-sgml/sgmls-1.1 directory, issue the commands:
$ make config.h
$ make
$ make install
$ make install.man
This should compile the parser and translator, and place the binaries
sgmls, sgmlsasp, and rast in linuxdoc-sgml/bin. I suggest that you
don't move those binaries from that location; instead, make symlinks
to them from /usr/local/bin or place linuxdoc-sgml/bin on your path.
(If you move things around within the linuxdoc-sgml tree you'll have
to edit a number of files to get everything to cooperate again. Best
to leave things as-is.)
If things don't work, try editing the Makefile in the sgmls-1.1
directory. I have it set to use gcc as the compiler, with rather
malignant options. It compiles fine on Linux and sun-4 systems.
This will also install man pages for the three binaries in
linuxdoc-sgml/man. You can move those or link them to your regular man
page tree, should you need them.
3. Edit the variables at the top of the scripts format, qroff,
preroff, prehtml, and qtex in linuxdoc-sgml/bin. All you really
need to edit is the value of the LINUXDOC shell variable which
gives the full pathname of the linuxdoc-sgml directory.
4. In the html-fix directory, issue the commands:
$ make
$ make install
This will build fixref and html2html, which are postprocessors for
the HTML conversion, and place them in the bin directory.
If all went well, you should be ready to use the system. Just be sure
that linuxdoc-sgml/bin is on your path or you've linked the files
therein to your standard binary directories. Again, don't just copy
them somewhere else; the scripts expect to find each other in that
directory.
2.2. Testing it out
You can now test the system. The format script takes an SGML document
as input and translates it to a given format. The qtex script will
process the output of format using LaTeX, and qroff will process it
using nroff.
Let's say you have the SGML document foo.sgml. You can translate it to
LaTeX, and produce PostScript output (via dvips) with the command:
$ format -Tlatex foo | qtex > foo.ps
Or, you can produce a DVI file using the -d switch with qtex, as so:
$ format -Tlatex foo | qtex -d > foo.dvi
If you want to produce plain ASCII, through groff, use the command:
$ format -Tnroff foo | qroff > foo.txt
Note that I have tailored the groff conversion for plain ASCII output.
(That is, I've removed page headers and page numbers, changed the
margins, and so on.) With some hacking you can produce PostScript and DVI
from the groff resulting from format, but I suggest that you use LaTeX
for that instead.
If you want to produce HTML, the procedure is a bit more complicated,
because of cross-references. Here's an example:
$ format -Thtml foo.sgml | prehtml | fixref > tmp.html
$ format -Thtml foo.sgml | prehtml >> tmp.html
$ cat tmp.html | html2html foo > foo.html
$ rm tmp.html
This will produce foo.html, as well as foo-1.html, foo-2.html, and so
on---one file for each section of the document. Run your WWW client
on foo.html, which is the toplevel file. Also make sure that all of
the HTML files corresponding to your document are in one directory, as
they reference each other with local URLs.
A good way to test this would be to run it on this file, guide.sgml.
If you just want to capture your errors from the SGML conversion, use
something like
$ format -Tnroff foo > /dev/null
2.3. Development note
The HTML conversion is, at this time, rudimentary but adequate. In the
future there will be support for cross-references, navigation buttons,
external URLs, and the like. Something is better than nothing. :)
Also, if you'd like to help me implement a texinfo (or plain Info)
conversion for Linuxdoc-SGML, let me know! As with HTML we'll have to
do some pre- and post-processing (which you supposedly shouldn't need
with SGML, ah well), but that's not a big issue.
3. Writing Documents with linuxdoc-sgml
For the most part, writing documents using the linuxdoc DTD is very
simple, and somewhat like LaTeX. However, there are some caveats to
watch out for. In this section I'll give an introduction to writing
SGML docs. See the file example.sgml for an SGML example document
(and tutorial) which you can use as a model when writing your own
docs. Here I'm just going to discuss the various features of SGML; the
source of this guide is not very readable as an example. Instead, print
out the source (as well as the formatted output) for example.sgml so you
have a real live case to refer to.
3.1. Basic concepts
Looking at the source of the example document, you'll notice right off
that there are a number of ``tags'' marked within angle brackets (<
and >). A tag simply specifies the beginning or end of an element,
where an element is something like a section, a paragraph, a phrase of
italicized text, an item in a list, and so on. Using a tag is like
using a LaTeX command such as \item or \section{...}.
As a simple example, to produce this boldfaced text, I typed
As a simple example, to produce <bf>this boldfaced text</bf>, ...
in the source. <bf> begins the region of bold text, and </bf> ends it.
Alternatively, you can use the abbreviated form
As a simple example, to produce <bf/this boldfaced text/, ...
which encloses the bold text within slashes. (Of course, you'll need
to use the long form if the enclosed text contains slashes, as is
the case with UNIX filenames.)
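The rule about slashes follows from how the short form is read: the first slash after the tag name closes the element. Here is a minimal Python sketch that uses a regular expression as a stand-in for the parser. It is a text transformation, not a full SGML parser. It expands the short form and picks the safe form for a given piece of text. Both function names are hypothetical.

```python
import re

# Short form: <bf/text/ means the same as <bf>text</bf>.
# The text may not contain "/", because the first slash ends the element.
SHORT_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/]*)/")


def expand_short_tags(source):
    """Rewrite each <name/text/ occurrence into the long form <name>text</name>."""
    return SHORT_TAG.sub(r"<\1>\2</\1>", source)


def mark_up(tag, text):
    """Use the short form when it is safe, and the long form otherwise."""
    if "/" in text:
        return f"<{tag}>{text}</{tag}>"
    return f"<{tag}/{text}/"


print(expand_short_tags("to produce <bf/this boldfaced text/, ..."))
print(mark_up("tt", "/etc/passwd"))  # long form, since the text has slashes
```

Passing the string <tt//etc/passwd/ to expand_short_tags shows the pitfall. It yields <tt></tt>etc/passwd/, which is an empty tt element followed by stray text.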
There are other things to watch out for with respect to special characters
(that's why you'll notice all of these bizarre-looking ampersand
expressions if you look at the source; I'll talk about those shortly).
In some cases, the end-tag for a particular element is optional. For
example, to begin a section you use the <sect> tag; however, the
end-tag for the section (which could appear at the end of the section body
itself, not just after the name of the section!) is optional and
implied when you start another section of the same depth. In general
you needn't worry about these details; just follow the model used in
the tutorial (example.sgml), and feel free to ask me if you have any
questions about the particulars.
3.2. Special characters
Obviously, the angle brackets are themselves special characters in the
SGML source. There are others to watch out for. For example, let's say
that you wanted to type an expression with angle brackets around it,
as so: <foo>. In order to get the left angle bracket, you must use the
&lt; entity, which is a ``macro'' that expands to the actual
left-bracket character. Therefore, in the source, I typed
angle brackets around it, as so: <tt>&lt;foo&gt;</tt>.
Generally, something beginning with an ampersand is a special macro.
For example, there's &percnt; to produce %, &verbar; to produce |, and
so on. For all ``special characters'' there exist these ampersand
entities to represent them.
Usually, you don't need to use the ampersand macro to get a special
character; however, in some cases it is necessary. The most commonly
used are:
o Use &amp; for the ampersand (&),
o Use &lt; for a left bracket (<),
o Use &gt; for a right bracket (>),
o Use &etago; for a left bracket with a slash (</),
o Use &dollar; for a dollar sign ($),
o Use &num; for a hash (#),
o Use &percnt; for a percent (%),
o Use `` and '' for quotes, or use &dquot; for ".
3.3. Verbatim and code environments
While we're on the subject of special characters, I might as well
mention the verbatim ``environment'' used for including literal text
in the output (with spaces and indentation preserved, and so on). The
verb element is used for this; it looks like the following:
<verb>
Some literal text to include as example output.
</verb>
The verb environment doesn't allow you to use everything within it
literally. Specifically, you must do the following within verb
environments:
o Use &ero; to get an ampersand,
o Use &etago; to get </,
o Don't use \end{verbatim} within a verb environment, as this is what
LaTeX uses to end the verbatim environment. (In the future, it
should be possible to hide the underlying text formatter entirely,
but the parser doesn't support this feature yet.)
The code environment is just like the verb environment, except
that horizontal rules are added to the surrounding text, as so:
___________________________________________________________________
Here is an example code environment.
___________________________________________________________________
You should use the tscreen environment around any verb environments,
as so:
<tscreen><verb>
Here is some example text.
</verb></tscreen>
tscreen is an environment that simply indents the text and sets the
default font to tt. This makes examples look much nicer, both
in the LaTeX and plain ASCII versions. You can use tscreen without
verb; however, if you use any special characters in your example
you'll need to use both of them. tscreen does nothing to special
characters. See example.sgml for examples.
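The verb rules above are mechanical enough to automate. Here is a minimal Python sketch, assuming the literal text is held in a Python string. escape_for_verb applies the &ero; and &etago; substitutions and refuses text containing \end{verbatim}. tscreen_verb wraps the result the way the tscreen paragraph recommends. Both function names are hypothetical.

```python
def escape_for_verb(text):
    """Prepare literal text for a verb environment, following the rules above."""
    if "\\end{verbatim}" in text:
        # LaTeX would end its verbatim environment here, so reject the text.
        raise ValueError("text contains \\end{verbatim}")
    # Replace every ampersand first; the "&" in "&etago;" added below is
    # therefore never rewritten.
    text = text.replace("&", "&ero;")
    return text.replace("</", "&etago;")


def tscreen_verb(text):
    """Wrap escaped literal text in tscreen and verb, as suggested above."""
    return "<tscreen><verb>\n" + escape_for_verb(text) + "\n</verb></tscreen>"


print(tscreen_verb("cc -o foo foo.c && ./foo </dev/null"))
```

The order of the two replacements matters. Escaping "</" first and ampersands second would turn the new &etago; into &ero;etago;.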
The quote environment is like tscreen, except that it does not set the
default font to tt. So, you can use quote for non-computer-interaction
quotes, as in:
<quote>
Here is some text to be indented, as in a quote.
</quote>
which will generate:
Here is some text to be indented, as in a quote.
3.4. Overall document structure
Before we get too in-depth with details, I'm going to describe the
overall structure of a document as defined by the linuxdoc DTD. Look
at example.sgml for a good example of how a document is set up.
3.4.1. The preamble
In the document ``preamble'' you set up things such as the title
information and document style. For a Linux HOWTO document this should
look like:
<!doctype linuxdoc system>
<article>
<title>The Linux Food-Processing HOWTO
<author>Norbert Ebersol, <tt/norbert@foo.com/
<date>v1.0, 9 March 1994
<abstract>
This document describes how to connect your Linux machine to a food-processor
for dicing vegetables.
</abstract>
<toc>
The elements should go more or less in this order. The first line
tells the SGML parser to use the linuxdoc DTD. The <article> tag
forces the document to use the ``article'' document style. (The
original QWERTZ DTD defines ``report'' and ``book'' as well; I haven't
tweaked these for use with linuxdoc-sgml. Just use article for your
SGML docs, for now.)
The title, author, and date tags should be obvious; in the date tag
include the version number and last modification time of the document.
The abstract tag sets up the text to be printed at the top of the
document, before the table of contents. If you're not going to include
a table of contents (the toc tag), you probably don't need an
abstract. I suggest that all Linux HOWTOs use this same format for the
preamble, so that the title, abstract, and table of contents are all
there and look the same.
3.4.2. Sectioning and paragraphs
After the preamble, you're ready to dive into the document. The
following sectioning commands are available:
o sect: For top-level sections (i.e. 1, 2, and so on.)
o sect1: For second-level subsections (i.e. 1.1, 1.2, and so on.)
o sect2: For third-level subsubsections.
o sect3: For fourth-level subsubsubsections.
o sect4: For fifth-level subsubsubsubsections.
These are roughly equivalent to their LaTeX counterparts section,
subsection, and so on.
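The numbering that these tags produce can be modelled with one counter per depth. Here is a minimal Python sketch; the function name is hypothetical. The "1.2." numbering style is the one seen in the formatted output of this guide. A heading at one level resets the counters of all deeper levels.

```python
# Depth of each sectioning tag, following the list above.
LEVELS = {"sect": 0, "sect1": 1, "sect2": 2, "sect3": 3, "sect4": 4}


def number_sections(headings):
    """Turn (tag, title) pairs into numbered headings such as "1.2. Title".

    Assumes levels are not skipped (e.g. no sect2 directly under a sect).
    """
    counters = [0] * len(LEVELS)
    numbered = []
    for tag, title in headings:
        level = LEVELS[tag]
        counters[level] += 1
        # Starting a section resets the counters of all deeper levels.
        for deeper in range(level + 1, len(counters)):
            counters[deeper] = 0
        number = ".".join(str(n) for n in counters[: level + 1])
        numbered.append(f"{number}. {title}")
    return numbered


for line in number_sections([
    ("sect", "Introduction"),
    ("sect1", "About this document"),
    ("sect1", "Why SGML?"),
    ("sect", "Installation"),
    ("sect1", "Installing the software"),
]):
    print(line)
```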
After the sect (or sect1, sect2, etc.) tag comes the name of the
section. For example, at the top of this document, after the preamble,
comes the tag:
<sect>Introduction
And at the beginning of this section (Sectioning and paragraphs),
there is the tag:
<sect2>Sectioning and paragraphs
After the section tag, you begin the body of the section. However, you
must start the body with a <p> tag, as so:
<sect>Introduction
<p>
This is a user's guide to the <tt/linuxdoc-sgml/ document processing...
This is to tell the parser that you're done with the section title and
are ready to begin the body. Thereafter, new paragraphs are started
with a blank line (just as you would do in TeX). For example,
Here is the end of the first paragraph.
And we start a new paragraph here.
There is no reason to use <p> tags at the beginning of every
paragraph; only at the beginning of the first paragraph after a sectioning
command.
3.4.3. Ending the document
At the end of the document, you must use the tag:
</article>
to tell the parser that you're done with the article element (which
embodies the entire document).
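Putting the preamble, sectioning, and closing rules together, a program that generates a linuxdoc-sgml document could look like the following minimal Python sketch. The function name and its arguments are hypothetical. All text is assumed to be escaped already, and every section is a top-level sect. The point is to show where <p>, the blank lines between paragraphs, and </article> go.

```python
def build_document(title, author, date, abstract, sections):
    """Assemble a linuxdoc-sgml article from plain Python values.

    sections is a list of (heading, [paragraph, ...]) pairs.
    """
    lines = [
        "<!doctype linuxdoc system>",
        "<article>",
        f"<title>{title}",
        f"<author>{author}",
        f"<date>{date}",
        "<abstract>",
        abstract,
        "</abstract>",
        "<toc>",
    ]
    for heading, paragraphs in sections:
        lines.append(f"<sect>{heading}")
        # Only the first paragraph after a sectioning tag needs <p>;
        # later paragraphs are separated by blank lines.
        lines.append("<p>")
        lines.append("\n\n".join(paragraphs))
    lines.append("</article>")
    return "\n".join(lines) + "\n"


print(build_document(
    "The Linux Food-Processing HOWTO",
    "Norbert Ebersol, <tt/norbert@foo.com/",
    "v1.0, 9 March 1994",
    "This document describes how to connect your Linux machine "
    "to a food-processor for dicing vegetables.",
    [("Introduction", ["First paragraph.", "Second paragraph."]),
     ("Installation", ["Unpack the tar file."])],
))
```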
3.5. Cross-references
Now we're going to move on to other features of the system.
Cross-references are easy. For example, if you want to make a
cross-reference to a certain section, you need to label that section as so:
<sect1><heading><label id="sec-intro">Introduction</>
You can then refer to that section somewhere in the text using the
expression:
See section <ref id="sec-intro" name="Introduction"> for an introduction.
This will replace the ref tag with the section number labelled as
sec-intro. The name argument to ref is necessary for nroff and HTML
translations (at the moment). The nroff macro set used by Linuxdoc-SGML
does not currently support cross-references, and it's often nice to
refer to a section by name instead of number.
For example, this section is ``Cross-references''.
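The Testing section notes that HTML output needs extra steps because of cross-references. One reason such references are awkward is that a ref may point forward to a label that has not been seen yet. A common way to handle this is two passes: one pass records the labels, and a second pass substitutes them. The following Python sketch shows that general technique on simplified input. It is an assumption about the idea, not a description of how fixref actually works, and resolve_refs is a hypothetical name.

```python
import re

# Simplified markup: labels and refs written exactly as in the examples above.
LABEL = re.compile(r'<label id="([^"]+)">')
REF = re.compile(r'<ref id="([^"]+)" name="[^"]*">')


def resolve_refs(sections):
    """Resolve <ref> tags to section numbers in two passes.

    sections is a list of (section number, source text) in document order.
    """
    # Pass 1: record the section number that carries each label.
    labels = {}
    for number, text in sections:
        for label in LABEL.findall(text):
            labels[label] = number

    # Pass 2: replace every ref, including forward refs, with that number.
    # Unknown labels become "??", much as LaTeX marks undefined references.
    def to_number(match):
        return labels.get(match.group(1), "??")

    return [(number, REF.sub(to_number, text)) for number, text in sections]


sections = [
    ("1", 'See section <ref id="sec-install" name="Installation"> first.'),
    ("2", '<label id="sec-install">Installation'),
]
for number, text in resolve_refs(sections):
    print(number, text)
```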
There is also a url element for Uniform Resource Locators, or URLs,
used on the World Wide Web. This element should be used to refer to
other documents, files available for FTP, and so forth. For example,
You can get the Linux HOWTO documents from
<url url="http://sunsite.unc.edu/mdw/linux.html"
name="the Linux Documentation Project home page">.
The url argument specifies the actual URL itself. A link to the URL in
question will be automatically added to the HTML document. The
optional name argument specifies the text that should be anchored to
the URL (for HTML conversion) or named as the description of the URL
(for LaTeX and nroff). If no name argument is given, the URL itself
will be used.
For example, you can get the Linuxdoc-SGML package from
(ftp://ftp.cs.cornell.edu/mdw/linuxdoc-sgml-1.1.tar.gz).
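The name fallback described above amounts to a one-line rule. Here is a minimal Python sketch of rendering a url element. The function name, the lowercase HTML anchor, and the "name (URL)" layout for non-HTML targets are assumptions for illustration, not Linuxdoc-SGML's actual replacement text.

```python
from html import escape


def render_url(url, name=None, target="html"):
    """Render a url element; without a name, the URL itself is the text."""
    text = url if name is None else name
    if target == "html":
        # The name (or the URL) becomes the anchored link text.
        return f'<a href="{escape(url)}">{escape(text)}</a>'
    # Plain-text layout for LaTeX and nroff is an assumption of this sketch.
    return url if name is None else f"{name} ({url})"


print(render_url("http://sunsite.unc.edu/mdw/linux.html",
                 "the Linux Documentation Project home page"))
print(render_url("ftp://ftp.cs.cornell.edu/mdw/linuxdoc-sgml-1.1.tar.gz",
                 target="text"))
```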
3.6. Fonts
Essentially, the same fonts supported by LaTeX are supported by
linuxdoc-sgml. Note, however, that the conversion to plain ASCII
(through groff) does away with the font information---I might hack up
plain-ASCII representations of the various fonts if the need arises.
So, you should use fonts as much as possible, for the benefit of the
conversion to LaTeX. But don't depend on the fonts to get a point
across in the plain ASCII version.
In particular, the tt tag described above can be used to get
constant-width ``typewriter'' font which should be used for all e-mail
addresses, machine names, filenames, and so on. Example:
Here is some <tt>typewriter text</tt> to be included in the document.
Equivalently:
Here is some <tt/typewriter text/ to be included in the document.
Remember that you can only use this abbreviated form if the enclosed
text doesn't contain slashes.
Other fonts can be achieved with bf for boldface and em for italics.
Several other fonts are supported as well, but I don't suggest you use
them, because we'll be converting these documents to other formats
such as HTML which may not support them. Boldface, typewriter, and
italics should be all that you need.
3.7. Lists
There are various kinds of supported lists. They are:
o itemize for bulleted lists such as this one.
o enum for numbered lists.
o descrip for ``descriptive'' lists.
Each item in an itemize or enum list must be marked with an item
tag. Items in a descrip list are marked with the tag element. For example,
<itemize>
<item>Here is an item.
<item>Here is a second item.
</itemize>
Looks like this:
o Here is an item.
o Here is a second item.
Or, for an enum,
<enum>
<item>Here is the first item.
<item>Here is the second item.
</enum>
You get the idea. Lists can be nested as well; see the example
document for details.
A descrip list is slightly different, and slightly ugly, but you might
want to use it for some situations:
<descrip>
<tag/Gnats./ Annoying little bugs that fly into your cooling fan.
<tag/Gnus./ Annoying little bugs that run on your CPU.
</descrip>
ends up looking like:
Gnats.
Annoying little bugs that fly into your cooling fan.
Gnus.
Annoying little bugs that run on your CPU.
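If list contents come from a program, the markup for all three kinds of lists can be generated the same way. Here is a minimal Python sketch; make_list is a hypothetical helper. It assumes the long form <tag>...</tag> is accepted just like the short form shown above, as it is for bf and tt. Using the long form keeps terms that contain slashes safe.

```python
def make_list(kind, items):
    """Build itemize, enum, or descrip markup from Python data.

    Items are assumed to be escaped already. For descrip, each item is a
    (term, description) pair written with the long <tag>...</tag> form.
    """
    if kind not in ("itemize", "enum", "descrip"):
        raise ValueError(f"unknown list kind: {kind}")
    lines = [f"<{kind}>"]
    for item in items:
        if kind == "descrip":
            term, description = item
            lines.append(f"<tag>{term}</tag> {description}")
        else:
            lines.append(f"<item>{item}")
    lines.append(f"</{kind}>")
    return "\n".join(lines)


print(make_list("descrip", [
    ("Gnats.", "Annoying little bugs that fly into your cooling fan."),
    ("Gnus.", "Annoying little bugs that run on your CPU."),
]))
```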
3.8. Miscellany
There are various other esoteric features in the system as well, most
of which you probably won't use. If you're curious, read the QWERTZ
User's Guide (from ftp.cs.cornell.edu in pub/mdw/SGML). QWERTZ (and
hence, linuxdoc) supports many features such as mathematical formulae,
tables, figures, and so forth. I don't recommend using most of these
features in the Linux HOWTOs because they won't render well in plain
ASCII. If you'd like to write general documentation in SGML, I suggest
using the original QWERTZ DTD instead of the hacked-up linuxdoc DTD,
which I've modified for use particularly by the Linux HOWTOs and other
documentation.
The bottom line is, linuxdoc-sgml supports many other features found
in the QWERTZ DTD, but I haven't necessarily tweaked them to work well
with linuxdoc-sgml. If you encounter problems with any of them, please
let me know.
4. The Linux HOWTO project
How does this tie into writing HOWTOs? First of all, I'd like to see
everyone eventually convert their HOWTOs to SGML using this DTD. This
has a number of advantages. For one thing, it will allow you to just
send me the SGML source, which I'll convert to plain ASCII, TeX,
whatever, for posting and archiving. Also, it will give the HOWTOs a
common look and feel; any changes that I make to the DTD will be
reflected in all of the HOWTOs.
I have set up the linuxdoc DTD to have a certain look and feel. If
you want your document to look different, please let me know,
because I'll need to make those changes in the DTD itself. That is, do
not modify your version of the DTD or replacement files to get other
features in the system. We all must use the same DTD and replacement
files or this whole system will break down. If you find bugs in it, or
have suggestions for how we can change things or add/modify features,
let me know. I'll be more than happy to accommodate you.